7 research outputs found

    Multiple Access Channels with Combined Cooperation and Partial Cribbing

    Full text link
    In this paper we study the multiple access channel (MAC) with combined cooperation and partial cribbing and characterize its capacity region. Cooperation means that the two encoders send a message to one another via a rate-limited link prior to transmission, while partial cribbing means that each of the two encoders obtains a deterministic function of the other encoder's output, with or without delay. Prior work in this field dealt separately with cooperation and partial cribbing; by combining the two methods we can achieve significantly higher rates. Remarkably, the capacity region does not require an additional auxiliary random variable (RV), since cooperation and partial cribbing both serve to generate a common message between the encoders. The proof combines block Markov coding, backward decoding, double rate-splitting, and joint typicality decoding. Furthermore, we present the Gaussian MAC with combined one-sided cooperation and quantized cribbing. For this model, we give an achievability scheme that shows how many cooperation or quantization bits are required to approach the capacity region of the Gaussian MAC with full cooperation/cribbing. After establishing our main results, we consider two cases where only one auxiliary RV is needed. The first is a rate-distortion dual setting for the MAC with a common message, a private message, and combined cooperation and cribbing. The second is a state-dependent MAC with cooperation, where the state is known at a partially cribbing encoder and at the decoder. However, there are cases where more than one auxiliary RV is needed, e.g., when the cooperation and cribbing are not used for the same purposes. We present a MAC with an action-dependent state, where the action is based on the cooperation but not on the cribbing; therefore, in this case more than one auxiliary RV is needed.
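The "partial cribbing" setup above can be made concrete with a minimal numerical sketch: encoder 2 observes not encoder 1's full signal but a deterministic function of it. Here we assume a 1-bit sign quantizer as that function; the paper allows any deterministic map, so the function name and signal model are illustrative assumptions, not the authors' construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Encoder 1's transmitted symbols (a stand-in Gaussian signal).
x1 = rng.normal(size=8)

def crib(x):
    # Hypothetical deterministic cribbing function: a 1-bit (sign) quantizer.
    # In the quantized-cribbing model, more quantization bits would give
    # encoder 2 a finer view of x1.
    return np.sign(x)

# What encoder 2 "cribs" (observed with or without delay in the paper).
z = crib(x1)
print(x1.round(2))
print(z)
```

The point of the sketch is only the information structure: `z` is a coarse, deterministic summary of `x1`, which the encoders can exploit to build a common message.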

    Towards Optimal Compression: Joint Pruning and Quantization

    Full text link
    Compression of deep neural networks has become a necessary stage for optimizing model inference on resource-constrained hardware. This paper presents FITCompress, a method for unifying layer-wise mixed-precision quantization and pruning under a single heuristic, as an alternative to neural architecture search and Bayesian-based techniques. FITCompress combines the Fisher Information Metric and path planning through compression space to pick optimal configurations given size and operation constraints with single-shot fine-tuning. Experiments on ImageNet validate the method and show that our approach yields a better trade-off between accuracy and efficiency than the baselines. Beyond computer vision benchmarks, we experiment with the BERT model on a language understanding task, paving the way towards its optimal compression.
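A common way to use Fisher information for mixed-precision decisions, which the abstract alludes to, is to treat the empirical Fisher (mean squared gradient) as a per-layer sensitivity score and give more bits to more sensitive layers. The sketch below is a hedged illustration of that idea only; the layer names, gradient values, and bit palette are made up and this is not the FITCompress algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in per-layer gradients (in practice, collected over a calibration set).
grads = {f"layer{i}": rng.normal(scale=1.0 + i, size=100) for i in range(4)}

# Empirical Fisher proxy: mean squared gradient per layer.
fisher = {name: float(np.mean(g ** 2)) for name, g in grads.items()}

# Rank layers from most to least sensitive.
ranked = sorted(fisher, key=fisher.get, reverse=True)

# Assumed bit palette: sensitive layers keep high precision.
palette = [8, 8, 4, 2]
bit_alloc = {name: palette[rank] for rank, name in enumerate(ranked)}
print(bit_alloc)
```

A real method would additionally check the resulting model against size/operation constraints and fine-tune, as the abstract describes.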

    FBM: Fast-Bit Allocation for Mixed-Precision Quantization

    Full text link
    Quantized neural networks are well known for reducing latency, power consumption, and model size without significant degradation in accuracy, making them highly applicable for systems with limited resources and low power requirements. Mixed-precision quantization offers better utilization of customized hardware that supports arithmetic operations at different bitwidths. Existing mixed-precision schemes rely on a large exploration space, resulting in a large carbon footprint. In addition, these bit allocation strategies mostly constrain the model size rather than exploiting the performance characteristics of the target deployment hardware. Our work proposes Fast-Bit Allocation for Mixed-Precision Quantization (FBM), which finds an optimal bitwidth allocation by measuring desired behaviors through a simulation of a specific device, or even on a physical one. While dynamic transitions of bit allocation in mixed-precision quantization with ultra-low bitwidth are known to suffer from performance degradation, we present a fast recovery solution for such transitions. A comprehensive evaluation of the proposed method on CIFAR-10 and ImageNet demonstrates our method's superiority over current state-of-the-art schemes in terms of the trade-off between neural network accuracy and hardware efficiency. Our source code, experimental settings, and quantized models are available at https://github.com/RamorayDrake/FBM
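Hardware-aware bit allocation of the kind FBM describes can be sketched as a greedy search: start every layer at high precision and repeatedly lower the bitwidth of the layer whose measured latency saving per unit of sensitivity is largest, until a latency budget is met. The latency and sensitivity tables below are made-up stand-ins for measurements taken on a simulated or physical device; this is an illustrative sketch, not FBM's actual allocation procedure.

```python
# Assumed per-layer tables (would come from device measurement in practice).
layers = ["conv1", "conv2", "fc"]
sensitivity = {"conv1": 3.0, "conv2": 1.0, "fc": 0.5}      # accuracy proxy cost per dropped bit
latency_per_bit = {"conv1": 2.0, "conv2": 4.0, "fc": 1.0}  # ms saved per dropped bit

bits = {layer: 8 for layer in layers}  # start at full 8-bit precision
budget = 40.0                          # target total latency (ms), assumed

def total_latency(b):
    return sum(latency_per_bit[l] * b[l] for l in layers)

while total_latency(bits) > budget:
    # Layers that can still drop a bit (floor at 2 bits).
    candidates = [l for l in layers if bits[l] > 2]
    # Greedy choice: best latency saving per unit of sensitivity.
    best = max(candidates, key=lambda l: latency_per_bit[l] / sensitivity[l])
    bits[best] -= 1

print(bits, total_latency(bits))
```

With these toy numbers the insensitive but latency-heavy `conv2` absorbs all the precision reduction, which is the qualitative behavior a hardware-aware allocator aims for.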

    Jet Single Shot Detection

    No full text
    We apply object detection techniques based on Convolutional Neural Networks to jet reconstruction and identification at the CERN Large Hadron Collider. In particular, we focus on CaloJet reconstruction, representing each event as an image composed of calorimeter cells and using a Single Shot Detection network, called Jet-SSD. The model performs simultaneous localization and classification and additional regression tasks to measure jet features. We investigate Ternary Weight Networks with weights constrained to {-1, 0, 1} times a layer- and channel-dependent scaling factor. We show that the quantized version of the network closely matches the performance of its full-precision equivalent.
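Ternary weight quantization of the kind described above maps each weight to alpha * t with t in {-1, 0, 1} and alpha a learned or computed scale. The sketch below uses the common Ternary Weight Networks heuristic (threshold Delta = 0.7 * mean|w|, scale alpha = mean magnitude of the surviving weights) per tensor; the exact threshold rule and the per-layer/per-channel granularity used in Jet-SSD are assumptions here.

```python
import numpy as np

def ternarize(w):
    # Threshold below which weights are zeroed (common TWN heuristic).
    delta = 0.7 * np.mean(np.abs(w))
    # Ternary mask in {-1, 0, 1}.
    t = np.where(np.abs(w) > delta, np.sign(w), 0.0)
    # Scale: mean magnitude of the weights that survived the threshold.
    mask = t != 0
    alpha = float(np.mean(np.abs(w[mask]))) if mask.any() else 0.0
    return alpha, t

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
alpha, t = ternarize(w)
print(alpha, np.unique(t))
```

The quantized layer then computes with `alpha * t` in place of `w`, which is why accuracy can stay close to full precision while weights need only two bits of storage.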

    Lightweight jet reconstruction and identification as an object detection task

    No full text
    We apply object detection techniques based on deep convolutional blocks to end-to-end jet identification and reconstruction tasks encountered at the CERN Large Hadron Collider (LHC). Collision events produced at the LHC, represented as images composed of calorimeter and tracker cells, are given as input to a Single Shot Detection network. The algorithm, named PFJet-SSD, performs simultaneous localization, classification, and regression tasks to cluster jets and reconstruct their features. This all-in-one single feed-forward pass gives advantages in execution time and improved accuracy w.r.t. traditional rule-based methods. A further gain is obtained from network slimming, homogeneous quantization, and optimized runtime for meeting the memory and latency constraints of a typical real-time processing environment. We experiment with 8-bit and ternary quantization, benchmarking their accuracy and inference latency against a single-precision floating-point baseline. We show that the ternary network closely matches the performance of its full-precision equivalent and outperforms the state-of-the-art rule-based algorithm. Finally, we report the inference latency on different hardware platforms and discuss future applications.
    ISSN:2632-215
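The homogeneous 8-bit quantization mentioned above is typically realized as symmetric per-tensor uniform quantization with a single scale. The sketch below illustrates that standard scheme and its round-trip error; it is a generic example, not the PFJet-SSD deployment pipeline.

```python
import numpy as np

def quantize_int8(x):
    # Symmetric per-tensor scale: map the largest magnitude to 127.
    scale = float(np.max(np.abs(x))) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
x = rng.normal(size=1000).astype(np.float32)

q, s = quantize_int8(x)
# Worst-case round-trip error is bounded by half a quantization step.
err = float(np.max(np.abs(dequantize(q, s) - x)))
print(q.dtype, err)
```

Eight-bit storage cuts weight memory 4x versus float32, and integer arithmetic is what lets such networks meet the real-time latency constraints the abstract targets.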